1. Research Objective and Main Results


The goal of this research is to develop and analyse algorithms which can, in some practical sense, solve NP-complete problems quickly. NP-complete problems appear in many disciplines such as Cryptology, Operations Research, Artificial Intelligence and Computer System Design. NP-complete problems are the “hardest” of a class of problems known as NP. Associated with each NP problem we consider is an infinite set of instances. Instances may take the form of graphs, logic expressions, sets or many other structures depending on the problem. Each instance has a size denoted by n. Although the size of an instance I may be formally defined as the number of bits needed to efficiently encode I, for our purposes we may regard the size of I to be the number of distinct objects in I. So, for example, a graph containing E edges and Q vertices has size n = E + Q. Associated with each instance I is a set of variables, a set of values that can be assigned to each variable, and a constraint function $U_I$ that maps value assignments to variables to {true, false}. For example, if I is a graph with Q vertices we might associate Q - 1 variables which take edge labels as values and a constraint function which has value true if and only if the edge set corresponding to the assignment given to the variables is a spanning tree of I. An assignment t such that $U_I(t) = \text{true}$ is a solution to I. An algorithm solves I if it determines whether or not a solution exists for I.

A problem in NP is said to be solved efficiently if there is an algorithm which solves every instance of the problem in time bounded by a polynomial in n. Unfortunately, there is no known computational scheme for efficiently solving any NP-complete problem, and it is considered highly unlikely that one will be found (see [2] and [18]). Thus, every known method for solving an NP-complete problem P fails to find the solution to some instances of P in a reasonable amount of time. Furthermore, there is little hope that even an effective randomized algorithm (see [19], [27] and [28]) will be found for any NP-complete problem since, as is well known, this would imply an unlikely collapse of the polynomial hierarchy. However, if a method A can be found to efficiently find solutions to all but a few instances of P, then A might be a practical method for solving P. We are interested in such (A, P) pairs.

We use probability theory to measure success in meeting our goal. A distribution D is assigned to the set of all possible instances of P of size n, and we prove one of three kinds of results for a given algorithm A:

a) A finds a solution to an instance of P chosen randomly according to D in time bounded by a polynomial in n with probability greater than some positive constant k as n gets large. Then we say A efficiently solves P in bounded probability under D.

b) A finds a solution to an instance of P chosen randomly according to D in time bounded by a polynomial in n with probability approaching 1 as n gets large. Then we say A efficiently solves P in probability under D.

c) A solves all of a large sample of instances of P chosen randomly according to D in average time that is bounded by a polynomial in n as n gets large. Then we say that A solves P in polynomial average time.

Results of type (a) are weaker than results of type (b), and results of type (b) are weaker than results of type (c). It is often the case that we can prove a weaker result but not a stronger one for a particular (A, P) pair under D. Although a type (c) result is the strongest type of result, even a type (b) result will allow us to conclude that A, in some practical sense (at least under D), efficiently solves P. A result of type (a) does not always allow us to draw the same conclusion since k may be very small (say .01). However, many algorithms that we consider make repeated attempts at finding a solution and, even if k is small, there is a good chance that one will be found after several attempts, as explained below.

Many  algorithms  we  consider  proceed  by  assigning  values  to  variables  in  some  order 
which  is  decided  during  computation  and  assignments  are  never  undone  either  totally 
or  partially.  These  algorithms  either  continue  until  all  variables  are  assigned  values  (in 
which  case  a  solution  has  been  obtained)  or  they  stop  prematurely  because  they  discover 
that  every  set  of  assignments  of  values  to  unassigned  variables  cannot  possibly  lead  to 
a  solution  (in  which  case  it  cannot  be  determined  whether  or  not  a  solution  exists).  A 
property  of  these  algorithms  is  that  the  next  variable  to  be  assigned  a  value  is  chosen 
randomly  from  a  large  group  of  possibilities.  Thus,  repeated  runs  of  such  algorithms  will 
execute  differently  and  possibly  give  different  results.  If  the  probability  that  a  run  finds 
a  solution  is  bounded  from  below  by  a  constant  and  all  runs  execute  independently  then 
only  a  constant  number  of  runs  would  be  necessary  for  us  to  solve  a  random  instance  of  P 
with  probability  arbitrarily  close  to  1  (this  can  be  strengthened  to  a  type  (b)  result  if  the 
number  of  runs  is  allowed  to  grow  slightly  with  n).  Unfortunately,  it  is  not  the  case  that 
all  runs  execute  independently.  However,  for  the  algorithms  we  consider,  the  dependence 
is  very  weak  and,  according  to  the  results  of  our  experiments,  we  are  justified  in  supposing 
that  a  small  number  of  repeated  runs  of  4  will  allow  us  to  solve  P  with  probability  tending 
to 1. Thus, a result of type (a) seems to translate to a result of type (b) for the kinds of algorithms we consider. When referring to results of type (a), (b) or (c) we will sometimes use the phrase “probabilistically efficient”.
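Under the idealizing assumption that runs are fully independent, the number of runs needed to turn a type (a) result with constant k into near-certain success is simple arithmetic: m runs all fail with probability at most $(1-k)^m$, so it suffices to take $m = \lceil \ln\delta / \ln(1-k) \rceil$ for a target failure probability $\delta$. The Python sketch below does this computation; `runs_needed` and `delta` are illustrative names of our own, and, as noted above, real runs are only approximately independent.

```python
import math


def runs_needed(k: float, delta: float) -> int:
    """Smallest m such that (1 - k) ** m <= delta.

    Idealizing assumption: the m runs are mutually independent and each one
    finds a solution with probability at least k, so all m runs fail with
    probability at most (1 - k) ** m.
    """
    if not (0.0 < k < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("k and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(delta) / math.log(1.0 - k))


# With k = .01 (the small constant mentioned above) and a 1% failure target:
print(runs_needed(0.01, 0.01))  # 459
```

Even for k = .01 a few hundred runs suffice, and the count grows only like $\ln(1/\delta)$ as the target failure probability shrinks.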

Others have taken this approach for specific NP-complete problems. Algorithms which are probabilistically efficient have been found for the Hamiltonian Circuit problem [1] and [23], the Planar Traveling Salesman problem [22], the Processor Scheduling problem [9], the Bin Packing problem [21] and other NP-complete problems. We have looked for similar results on algorithms for the Satisfiability problem.

An instance of the Satisfiability problem is, for the purposes of this research, a Boolean expression in Conjunctive Normal Form (CNF). The disjunctions, also called clauses, contain a subset of the set of positive and negative literals obtained from a set V of Boolean variables. An assignment of truth values to the variables of V is called a truth assignment to V. If v ∈ V is assigned the value true then the positive literal v has the value true and the negative literal ¬v has the value false. The two unit clauses (v) and (¬v) are said to be complementary. A clause is satisfied if one or more of its literals has value true. An instance is satisfiable if there exists a truth assignment t to V which satisfies all clauses in it. Such an instance is said to be satisfied by t. The Satisfiability problem is: given an instance I, find a truth assignment which satisfies I, if one exists, or verify that no such truth assignment exists.
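These definitions translate directly into a small satisfaction check. The Python sketch below assumes a DIMACS-style encoding that is not part of this report: variable v is the integer v ≥ 1, the positive literal is v, the negative literal is -v, and a clause is a frozenset of literals. Note that a clause with no literals can never be satisfied.

```python
from typing import Dict, FrozenSet, List

# Assumed encoding (not from the report): literal v > 0 is variable v,
# literal -v is its negation; a clause is a frozenset of literals.
Clause = FrozenSet[int]


def literal_value(lit: int, t: Dict[int, bool]) -> bool:
    """Value of a literal under truth assignment t (variable -> bool)."""
    value = t[abs(lit)]
    return value if lit > 0 else not value


def satisfies(t: Dict[int, bool], instance: List[Clause]) -> bool:
    """A clause is satisfied if one or more of its literals is true;
    an instance is satisfied by t if every clause is."""
    return all(any(literal_value(lit, t) for lit in clause) for clause in instance)


# (v1 or not v2) and (v2 or v3)
I = [frozenset({1, -2}), frozenset({2, 3})]
print(satisfies({1: True, 2: True, 3: False}, I))   # True
print(satisfies({1: False, 2: True, 3: True}, I))   # False
```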

In  order  to  understand  performance  over  a  range  of  instance  types  we  attempt  to  get 
results  for  a  family  of  distributions.  We  call  such  a  family  an  input  model  or  instance 
model. 

Our main results, based on two input models, show the existence of probabilistically efficient algorithms for solving random instances of Satisfiability which are satisfiable with high probability. These results may be found in [5], [6], [7], [11], [12], [13], and [17]. The best algorithms are variants of the Davis-Putnam procedure which choose elimination variables successively and dynamically from clauses containing the least number of unassigned variables. The breakthrough in attaining these results is the application of flow analysis techniques in which clauses are regarded as objects which flow into and out of levels, where level i represents the set of clauses containing exactly i unassigned variables. A flow of less than one clause per iteration into the bottom level (one unassigned variable in a clause) can be handled without accumulation in the bottom level by choosing to assign a value which satisfies a clause at the bottom level. A flow greater than one per iteration results in an accumulation analogous to a bathtub overflowing because the drain is too small. Heavy accumulation at the bottom level increases the probability that two complementary clauses exist there. In such a case the current partial assignment cannot be extended to a satisfying truth assignment. This mechanistic way of looking at the operation of the algorithm has provided great insight into its probabilistic performance.
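The bathtub analogy can be illustrated with a toy queue. This is not the flow analysis of the report, only a sketch of the accumulation effect under assumptions of our own: each iteration a random number of clauses arrives at the bottom level (here Binomial(4, rate/4), so the mean inflow is `rate`) and the algorithm drains at most one clause by satisfying it.

```python
import random


def bottom_level_backlog(rate: float, steps: int, seed: int = 0) -> int:
    """Toy model of level 1: each step Binomial(4, rate/4) clauses flow in and
    at most one is drained (satisfied). Returns the backlog after `steps` steps."""
    if not 0.0 <= rate <= 4.0:
        raise ValueError("rate must lie in [0, 4]")
    rng = random.Random(seed)
    backlog = 0
    for _ in range(steps):
        backlog += sum(rng.random() < rate / 4 for _ in range(4))  # inflow
        if backlog > 0:
            backlog -= 1                                           # drain one clause
    return backlog
```

With a mean inflow below one clause per iteration, for example `bottom_level_backlog(0.5, 5000)`, the backlog stays small; with `bottom_level_backlog(1.5, 5000)` it grows roughly linearly and ends in the thousands, like the overflowing bathtub.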

On the other hand, it is sometimes the case that a family of algorithms almost always requires exponential time to solve random instances of an NP-complete problem. For example, in [8] it is shown that a powerful formal system for determining the stability number of a graph requires exponential time on almost all randomly generated graphs with a sufficiently large linear number of edges. We have obtained similar pessimistic results for Search Rearrangement Backtracking (a general, powerful family of search procedures which includes the Davis-Putnam procedure [10]) on the Satisfiability problem when random instances are nearly always unsatisfiable [14]. Stated another way, this result says that, regardless of what heuristic one uses, verification that a given instance is unsatisfiable will take exponential time on the average. An important aspect of this result is that the analysis explains why no heuristic can possibly be probabilistically efficient. The pessimistic result is due to two properties that most random instances have: each variable appears in at most a small fraction of the clauses, and the number of times that variables are linked in large enough subsets of the original set of clauses is not much greater than the number of clauses in the subset (if a variable v is in five clauses of a particular subset of clauses then v contributes four links to the number of links). In fact, any instance with these two properties is “hard” in the sense that no heuristic (on top of backtracking) can solve it in polynomial time.
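The link count used in this argument is easy to state in code. In the Python sketch below (our own illustration), clauses are frozensets of integer literals, v for a positive and -v for a negative occurrence, which is an assumed encoding. A variable contained in m ≥ 1 clauses of the subset contributes m - 1 links, matching the five-clause, four-link example above.

```python
from collections import Counter
from typing import FrozenSet, Iterable


def count_links(subset: Iterable[FrozenSet[int]]) -> int:
    """Links in a subset of clauses: a variable contained in m >= 1 of the
    clauses contributes m - 1 links (literals are v or -v, an assumed encoding)."""
    clauses_containing = Counter()
    for clause in subset:
        for var in {abs(lit) for lit in clause}:  # count each clause once per variable
            clauses_containing[var] += 1
    return sum(m - 1 for m in clauses_containing.values())


subset = [frozenset({1}), frozenset({1, 2}), frozenset({-1, 3}),
          frozenset({1, -2}), frozenset({-1})]
print(count_links(subset), len(subset))  # 5 5
```

Here variable 1 lies in all five clauses and contributes four links, variable 2 contributes one and variable 3 none, so the subset has as many links as clauses.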

Since  probabilistic  results  of  the  kind  stated  above  depend  on  input  distribution, 
some  justification  and  analysis  of  the  input  model  is  desirable.  Part  of  our  work  has  been 
to  investigate  properties  of  input  models  which  induce  probabilistic  efficiency.  We  have 
found  that  some  models  generate  a  preponderance  of  “trivial”  instances:  those  which  can 
be  solved  by  an  algorithm  that  would  be  considered  too  weak  to  be  used  in  practice.  For 
example,  some  models  allow  clauses  which  contain  no  literals  (null  clauses).  But  instances 
which  contain  a  null  clause  cannot  be  satisfied  by  any  truth  assignment  to  the  variables 
contained  in  them.  Thus,  if  a  null  clause  exists  in  a  random  instance  with  probability 
tending  to  1,  then  random  instances  are  efficiently  solved  in  probability  simply  by  searching 
the  input  for  a  null  clause.  Clearly,  this  algorithm  would  be  useless  in  practice. 

However, some favorable results on backtracking variants which appear in the literature (e.g. [25] and [26]) depend on the high frequency of null clauses generated by the input model, although this fact is hidden in the analysis. We have shown ([12] and [13]) that even exhaustive search, after checking the input for a null clause and finding none, is average-case superior to the algorithms analyzed in the citations above because it usually stops early due to a null clause. More remarkably, we showed [13] that exhaustive search runs in polynomial average time under the same conditions for which the relatively sophisticated pure-literal-rule algorithm was shown to require superpolynomial average time [3] (because that algorithm failed to check for null clauses first). Thus, our investigations into input model properties have provided great insight into the nature of previous results and the utility of past and future results in this area.

The remainder of this report details the results we have attained under the grant. In addition to the work on Satisfiability we have collaborated with other researchers on related problems in Quadtree representations (parallel architectures), hashing with lazy deletions (data base theory), and VLSI testing and verification. Sections 2 and 3 describe our results on algorithms for Satisfiability. Sections 4 to 6 describe results in the other areas. Sections 7, 8, and 9 list publications under the grant, recent invited talks, and recent professional service.


2. Overview of Results for the Satisfiability Problem


The Satisfiability problem (SAT) is the first problem found to be NP-complete. Any problem in NP can easily be transformed to SAT, and transformations from SAT to other NP-complete problems are often straightforward. SAT is, therefore, one of the more important NP-complete problems.

SAT is also an important problem because it turns up in a number of practical areas. For example, a collection of propositions P and a hypothesis H can be transformed to an instance $I_1$ of SAT such that $I_1$ is not satisfiable if and only if H follows logically from P. Thus, SAT is the basis for a number of theorem provers and is of interest to the Artificial Intelligence community. SAT also appears in automatic hardware testing and design, as the following examples show:

1. A combinational circuit C computes one Boolean function of its inputs for each of its outputs. Its functionality can, therefore, be described by a set of Boolean formulas $\{B_i(C)\}$, one for each output. A test sequence S for C must set input values to exercise both logic levels of each output of C. Finding a set of input values which forces the i-th output to level 1 (0) is equivalent to finding a truth assignment to the variables of $B_i(C)$ which satisfies $B_i(C)$ ($\neg B_i(C)$). Thus, the problem of generating (designing) a test sequence for a combinational circuit that checks for stuck-at faults can be stated as an instance of SAT (actually a set of instances of SAT).

2. The VLSI design process typically proceeds through many levels of abstraction from the functional level through the gate level down to the layout level. Functional equivalence must be maintained after each translation from one level to the next level down, or else the end product may not perform as expected. If the circuit is combinational, functional equivalence may be regarded as Boolean equivalence between two levels of abstraction. Without going into the details of how one can obtain a Boolean formula describing the functionality of a particular level of abstraction, one way to test for Boolean equivalence at different levels is to determine whether the Boolean formula B formed from the equivalence (the complement of the exclusive-or) of the formulas at both levels is a tautology: if it is, then the circuit descriptions at both levels are identical; otherwise they are not. But the problem of determining whether B is a tautology is equivalent to the problem of determining whether ¬B (the exclusive-or of the two formulas) has no solutions, and this is an instance of SAT. Thus, SAT is important to functional verification (also known as logic verification) between different levels of abstraction in the design of VLSI combinational circuits. The problem of testing for functional equivalence between different levels of abstraction in the design of sequential circuits may be reduced to a combinational circuit problem by means of the level sensitive scan design or equivalent approaches. Thus, SAT is important to VLSI design over a wide variety of circuit types.
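The reduction in item 2 can be illustrated on a toy scale: two descriptions f and g are equivalent exactly when f XOR g has no satisfying input. The Python sketch below decides this by enumerating all inputs, which is only feasible for a handful of inputs; a realistic tool would instead hand the exclusive-or, converted to CNF, to a SAT algorithm. The functions `spec`, `gates` and `buggy` are hypothetical examples.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def find_difference(f: Callable[..., bool], g: Callable[..., bool],
                    n_inputs: int) -> Optional[Tuple[bool, ...]]:
    """Search all 2**n_inputs input vectors for one where f XOR g is true.

    Returns such a vector (a counterexample to equivalence), or None when
    f XOR g has no solutions, i.e. f and g are functionally equivalent.
    """
    for x in product((False, True), repeat=n_inputs):
        if f(*x) != g(*x):  # for booleans, != is exclusive-or
            return x
    return None


def spec(a: bool, b: bool, c: bool) -> bool:    # hypothetical functional level
    return a and (b or c)


def gates(a: bool, b: bool, c: bool) -> bool:   # hypothetical gate level
    return (a and b) or (a and c)


def buggy(a: bool, b: bool, c: bool) -> bool:   # a faulty translation
    return a or (b and c)


print(find_difference(spec, gates, 3))  # None
print(find_difference(spec, buggy, 3))  # (False, True, True)
```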

Unfortunately, SAT is an NP-complete problem. Therefore, there is no known efficient algorithm for SAT (an algorithm whose running time is bounded by a polynomial in the length of the given instance is considered efficient). Thus, the best that can be hoped for is an algorithm, based on some heuristic, which requires polynomial time on most instances. If the probability that an algorithm A for SAT runs in polynomial time tends to 1 as instance size increases, then A may be regarded as an efficient algorithm for SAT in a practical (or at least probabilistic) sense. We have been looking at the question of whether probabilistically efficient algorithms exist for SAT and other NP-complete problems.

In order to answer the question of probabilistically efficient algorithms for SAT we must impose some distribution on instances. This presents two problems regarding the robustness of probabilistic results. First, a result obtained under one distribution does not necessarily hold under another (or, more appropriately, analysed behavior assuming one distribution may be dramatically different from empirical behavior on a naturally occurring set of instances). Second, in order to produce an analysis at all it is practically a requirement that all “components” of a given instance be independent. In our work instances are CNF Boolean expressions and “components” are clauses. We use two models for constructing random instances. In both models a random instance of SAT consists of n clauses, each containing literals taken from a set of r variables. In model $M_1$ each clause contains exactly k literals and is chosen uniformly from the set of all possible k-literal clauses. In model $M_2$ each clause contains each literal with probability p (so clauses may have any number of literals up to 2r). In order to reduce the effect of the two problems mentioned above we allow r and p to be functions of n; this allows us to adjust the properties of random instances to closely match the properties of many natural sets of instances.

For example, consider how we might set the parameters of model $M_2$ to get instances with properties that match instances of the test design problem of item (1) above (we omit a similar discussion of $M_1$). Suppose $p(n) = \alpha \ln(n)/r(n)$, $0 < \alpha < 1$ (this restriction is not really necessary but it gives a useful example). Then random instances are usually satisfiable if, for any $\epsilon > 0$, $\lim_{n\to\infty} n^{1-\alpha}/r^{1-\epsilon} < \infty$, and instances are usually unsatisfiable if $\lim_{n\to\infty} n^{1-\alpha}/r = \infty$ (the higher or lower the rate of growth of $n^{1-\alpha}/r$, the more unsatisfiable or satisfiable, respectively, random instances are). Since all interesting combinational circuits correspond to satisfiable Boolean formulas, inputs to the test design problem of item (1) above have the property that they are satisfiable. Hence, we should make the function $f(n) = n^{1-\alpha}/r(n)$ tend rapidly to zero to generate random instances that closely match this property. It turns out that it probably does not matter exactly how fast f(n) tends to zero because one of our main results is that SAT is a “probabilistically easy” problem if f(n) tends to zero, regardless of how fast f(n) tends to zero (we caution the reader not to make too much of the relationship between $M_2$ and instances of test design at this time; the preceding example merely illustrates a general connection between the two).
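A sketch of how random instances might be drawn under the two models, in Python. The encoding (literal v for a positive and -v for a negative occurrence) and the function names are our own. For $M_1$ we assume that “chosen uniformly from the set of all possible k-literal clauses” means k distinct variables chosen uniformly, each negated independently with probability 1/2; other readings are possible.

```python
import random
from typing import FrozenSet, List

Clause = FrozenSet[int]  # literal v > 0 is variable v, -v its negation (assumed encoding)


def random_instance_m1(n: int, r: int, k: int, rng: random.Random) -> List[Clause]:
    """Model M1 sketch: n clauses, each with exactly k literals.

    Assumption: a clause uses k distinct variables chosen uniformly from 1..r,
    and each variable is negated independently with probability 1/2.
    """
    if not 1 <= k <= r:
        raise ValueError("need 1 <= k <= r")
    return [frozenset(v if rng.random() < 0.5 else -v
                      for v in rng.sample(range(1, r + 1), k))
            for _ in range(n)]


def random_instance_m2(n: int, r: int, p: float, rng: random.Random) -> List[Clause]:
    """Model M2: n clauses; each of the 2r literals is put in each clause
    independently with probability p (so null clauses are possible)."""
    return [frozenset(lit for v in range(1, r + 1) for lit in (v, -v)
                      if rng.random() < p)
            for _ in range(n)]
```

For example, `random_instance_m2(n, r, alpha * math.log(n) / r, random.Random(1))` would realize the choice $p(n) = \alpha \ln(n)/r(n)$ discussed above for given n, r and alpha.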

Our main results, in general terms, are as follows:

a. Almost all satisfiable instances of SAT generated under $M_2$ can be solved in $O(n \ln n)$ time.

b. Almost all “interesting” unsatisfiable instances of SAT generated under $M_1$ require exponential time to solve (that is, to verify unsatisfiability) by Backtracking using any heuristic for variable elimination.

The results are stated precisely in Section 3. We should like to point out that “Backtracking using any heuristic” represents a wide class of algorithms, so the result of item (b) is fairly strong. The results suggest that SAT is “probabilistically easy” if most inputs are satisfiable and is “probabilistically hard” if most inputs are unsatisfiable. Translating this to the Design Automation problems raised earlier, the results suggest that the test design problem is “probabilistically easy” but the functional or logic verification problem is “probabilistically hard”. Actually, this phenomenon has been observed in the Design Automation community for some time. However, the meaning of “easy” and “hard” has not been understood. Experiments have shown that some algorithms are faster than others for certain classes of inputs of a certain size (for example, see [20], [24] and [29]), but these experiments do not seem to say how fast, in an absolute sense, over many classes of inputs (even unforeseen ones) and many different sizes. Also, these experiments usually give no insight as to why the proposed algorithms are efficient or inefficient, beyond the plausible arguments that led the authors to choose the heuristics that drive the algorithms. On the other hand, our results do say something about efficiency on general classes of inputs of all sizes, and our analytic methods say why. However, the inputs we can make claims about are CNF Boolean formulas, whereas the inputs in, say, the VLSI world are formulas based on multi-level logic. If our results are to apply to real world problems we must either redo our analysis using other models or show that converting to CNF Boolean formulas has little effect on probabilistic results. Neither task is easy, and investigation of both tasks is among our long range goals.

Work relating SAT to Design Automation problems is being conducted jointly with Kurt Keutzer of Bell Telephone Laboratories, Murray Hill, New Jersey. Our results to date are given in [15] and mentioned in Section 4.
3. Results on Algorithms for SAT


Let I be a CNF Boolean formula and let H(I) be a heuristic function that outputs a variable contained in I. Let a clause be regarded as a set of literals. Call a clause that contains one literal a unit clause. We have analysed a wide class of algorithms, each of which returns “SAT” if the input instance is satisfiable and “UNSAT” if the input instance is not satisfiable. We call this class SRB (for Search Rearrangement Backtracking) and express it as follows:

SRB(I):
    If I has a null clause then return "UNSAT"
    Else if I is empty then return "SAT"
    Else
        v ← H(I)
        I_1 ← {c - {¬v} : c ∈ I, v ∉ c}
        I_2 ← {c - {v} : c ∈ I, ¬v ∉ c}
        If SRB(I_1) = "UNSAT" and SRB(I_2) = "UNSAT" then return "UNSAT"
        Else return "SAT"

In SRB, $I_1$ is the subinstance of SAT obtained from I by assigning the value true to variable v, and $I_2$ is the subinstance obtained by assigning the value false to v.
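A minimal Python rendering of SRB, assuming clauses are frozensets of nonzero integers (v for a positive literal, -v for a negative one). The heuristic H is passed in as a function; `shortest_clause_first` is a hypothetical example in the spirit of choosing variables from the shortest clauses, not a heuristic analysed in the report.

```python
from typing import Callable, FrozenSet, List

Clause = FrozenSet[int]  # assumed encoding: v > 0 is v, -v is its negation
Heuristic = Callable[[List[Clause]], int]


def assign(instance: List[Clause], lit: int) -> List[Clause]:
    """Make literal lit true: drop the clauses it satisfies and delete its
    complement from the remaining clauses."""
    return [c - {-lit} for c in instance if lit not in c]


def srb(instance: List[Clause], h: Heuristic) -> str:
    if any(len(c) == 0 for c in instance):  # a null clause: I cannot be satisfied
        return "UNSAT"
    if not instance:                         # every clause has been satisfied
        return "SAT"
    v = h(instance)                          # variable chosen by the heuristic H
    i1 = assign(instance, v)                 # subinstance with v = true
    i2 = assign(instance, -v)                # subinstance with v = false
    # `and` short-circuits, so I_2 is searched only if I_1 is unsatisfiable;
    # the returned answer is the same as in the pseudocode.
    if srb(i1, h) == "UNSAT" and srb(i2, h) == "UNSAT":
        return "UNSAT"
    return "SAT"


def shortest_clause_first(instance: List[Clause]) -> int:
    """Hypothetical H: a variable from a clause with the fewest literals."""
    clause = min(instance, key=len)
    return abs(min(clause, key=abs))
```

Swapping in a different `h` changes only the search order, never the answer, which is why a lower bound that holds for every H, as in Theorem 4 below, covers the whole class.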

We have also investigated the following two algorithms (which do not backtrack):

A_1(I):
    Construct a random truth assignment t to the variables of I
    Check whether t satisfies I
    If t satisfies I then return "SAT"
    Else return "GIVE UP"
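$A_1$ in Python, under an assumed encoding of clauses as frozensets of integer literals (v or -v). For model $M_2$ its success probability can be written down exactly by a short argument of our own: given t, exactly r literals are true, each lands in a given clause independently with probability p, so a clause is left unsatisfied with probability $(1-p)^r$, and the n clauses are independent. Hence $P(A_1 \text{ returns SAT}) = (1 - (1-p)^r)^n \ge 1 - n e^{-pr}$, which tends to 1 whenever $pr - \ln n \to \infty$.

```python
import random
from typing import FrozenSet, List

Clause = FrozenSet[int]  # assumed encoding: v > 0 is v, -v is its negation


def algorithm_a1(instance: List[Clause], rng: random.Random) -> str:
    variables = {abs(lit) for c in instance for lit in c}
    t = {v: rng.random() < 0.5 for v in variables}  # uniform random assignment
    # literal lit is true under t iff its sign agrees with t[|lit|]
    if all(any((lit > 0) == t[abs(lit)] for lit in c) for c in instance):
        return "SAT"
    return "GIVE UP"


def a1_success_probability_m2(n: int, r: int, p: float) -> float:
    """Exact P(A1 returns SAT) on an M2 instance (our own derivation)."""
    return (1.0 - (1.0 - p) ** r) ** n
```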


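A_2(I):
    While I ≠ ∅ and, for all c ∈ I, c ≠ ∅
        If there is a unit clause {u} ∈ I then v ← u
        Else choose a literal v randomly from L
        I ← {c - {comp(v)} : c ∈ I and v ∉ c}
        L ← L - {v, comp(v)}
    If I = ∅ then return "SAT"
    Else return "GIVE UP"

Here L is the set of literals whose variables have not yet been assigned a value, and comp(v) is the complement of literal v.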
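A Python sketch of $A_2$ (satisfy a unit clause if one exists, otherwise set a random literal true), assuming clauses are frozensets of integer literals. L is kept as the set `free` of literals whose variables are still unassigned; sorting it before `rng.choice` only makes runs reproducible for a fixed seed.

```python
import random
from typing import FrozenSet, List

Clause = FrozenSet[int]  # assumed encoding: v > 0 is v, -v is its negation


def algorithm_a2(instance: List[Clause], rng: random.Random) -> str:
    clauses = [frozenset(c) for c in instance]
    # L: both literals of every variable that has not been assigned yet
    free = {s * abs(lit) for c in clauses for lit in c for s in (1, -1)}
    while clauses and all(clauses):           # I nonempty and no null clause
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is not None:
            v = next(iter(unit))              # satisfy a unit clause first
        else:
            v = rng.choice(sorted(free))      # otherwise a random literal from L
        clauses = [c - {-v} for c in clauses if v not in c]
        free -= {v, -v}
    return "SAT" if not clauses else "GIVE UP"
```

The loop never needs to choose from an empty L: every literal left in a clause belongs to an unassigned variable, so a nonempty I without null clauses always leaves `free` nonempty.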
The algorithms above can easily be modified to return solutions instead of "SAT". The resulting modifications do not significantly affect the efficiency of these algorithms. We chose the above forms because they are easier to analyse.

Here  are  the  results. 

Theorem 1: ([11])

Suppose instances of SAT are generated according to model $M_2$. Let p and r be functions of n.

a. If $\lim_{n\to\infty} \ln(n)/(pr) < 1$ then the probability that a random instance has a solution tends to 1 as n tends to infinity.

b. If $\lim_{n\to\infty} \ln(n)/(pr) = c$, $1 < c < 2.5$, and $\lim_{n\to\infty} n^{1-c^{-1}}/r = \infty$ then the probability that a random instance has a solution tends to 0 as n tends to infinity.

c. If $\lim_{n\to\infty} \ln(n)/(pr) > 2.5$ then the probability that a random instance has a solution tends to 0 as n tends to infinity.

Theorem 2: ([11])

Suppose instances of SAT are generated according to model $M_2$. Let p and r be functions of n.

a. If $\lim_{n\to\infty} \ln(n)/(pr) < 1$ then the probability that algorithm $A_1$ finds a solution to a random instance tends to 1 as n tends to infinity.

b. If $\lim_{n\to\infty} \ln(n)/(pr) = c$, $1 < c < 2.5$, and $\lim_{n\to\infty} n^{1-c^{-1}}/r^{1-\epsilon} < \infty$ for any $\epsilon > 0$, then the probability that algorithm $A_2$ finds a solution to a random instance tends to 1 as n tends to infinity.

It can be shown that, with probability tending to 1, every variable appears in $O(\ln n)$ clauses of a random instance of SAT. Thus, algorithm $A_2$ almost always runs in $O(n \ln n)$ time. From this and Theorems 1 and 2 we can assert that, under $M_2$, almost all satisfiable instances of SAT can be solved in $O(n \ln n)$ time.

Theorem 3:

Suppose instances of SAT are generated according to model $M_1$. Let r be a function of n.

a. If $n/r > -1/\lg(1 - 2^{-k})$ then a random instance of SAT is unsatisfiable with probability tending to 1.

b. If $n/r < -1/\lg(1 - 2^{-k})$ then the average number of solutions per random instance is exponential in n.
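The flip point can be traced to a first-moment computation, given here as our own illustration rather than as the proof behind Theorem 3. If each clause has k distinct variables (an assumption), a fixed truth assignment satisfies a random clause with probability $1 - 2^{-k}$, so the expected number of solutions is $2^r (1 - 2^{-k})^n$. Its base-2 logarithm, $r + n \lg(1 - 2^{-k})$, is positive exactly when $n/r < -1/\lg(1 - 2^{-k})$.

```python
import math


def flip_point(k: int) -> float:
    """The ratio n/r = -1/lg(1 - 2**-k) for clauses with k literals."""
    return -1.0 / math.log2(1.0 - 2.0 ** (-k))


def log2_expected_solutions(n: int, r: int, k: int) -> float:
    """lg of 2**r * (1 - 2**-k)**n, the expected number of satisfying
    assignments, assuming every clause has k distinct variables."""
    return r + n * math.log2(1.0 - 2.0 ** (-k))


print(round(flip_point(3), 3))                   # 5.191
print(log2_expected_solutions(400, 100, 3) > 0)  # True: n/r = 4 is below the flip point
print(log2_expected_solutions(600, 100, 3) < 0)  # True: n/r = 6 is above it
```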

Theorem 3 is analogous to Theorem 1 (for model $M_2$). The point at which instances change from being mostly unsatisfiable to having a large average number of satisfying assignments is given by $n/r = -1/\lg(1 - 2^{-k})$ and is called the flip point. We use $M_1$ in place of $M_2$ when considering the problem of verifying that unsatisfiable instances have no solution because model $M_2$ generates too many instances with null clauses. If a null clause appears in an instance then that instance is trivially unsatisfiable. Model $M_1$ does not allow trivial instances of this kind.
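How often $M_2$ produces null clauses can be computed exactly: a clause receives none of the 2r literals with probability $(1-p)^{2r}$, and the n clauses are independent, so the probability of at least one null clause is $1 - (1 - (1-p)^{2r})^n$. The Python helper below (a name of our own) evaluates this; with n = 1000, r = 100 and p = .01 the probability is 1 to double precision, so such instances are almost always trivially unsatisfiable.

```python
def prob_null_clause(n: int, r: int, p: float) -> float:
    """P(an M2 instance with n clauses over r variables contains a null clause)."""
    q = (1.0 - p) ** (2 * r)     # a single clause receives none of its 2r literals
    return 1.0 - (1.0 - q) ** n  # at least one of the n independent clauses is null


print(prob_null_clause(1000, 100, 0.01))  # 1.0
```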

Theorem 4: ([14])

Suppose instances of SAT are generated according to model $M_1$. Let r be a function of n. Then, for all heuristic functions H, SRB requires superpolynomial time with probability
tending  to  1  if  liinn-QQ  n/r  =  o(n1/,"l"tn))  and  n/r  >  -l/lg(l  -  2“k). 

Theorems 3 and 4 say that even the cleverest heuristic function imaginable cannot give us a probabilistically efficient backtrack-based algorithm for verifying unsatisfiability
if  n/r  =s  o(ni/i*  >«(»))  and  n/r  >  -l/lg(l  -  2“*). 

We also have some average-case results based on model $M_2$. Such results give perspective to several average-case papers (e.g. [3], [25], and [26]) by showing the dependence of the favorable results on the presence of null clauses in random instances.

The algorithms below depend on the following definitions. Let a variable which appears exactly once in an instance I be called a unit variable. Let a variable which appears exactly twice in I be called a double variable. Let a variable which appears at least twice in I be called a weak-serious variable. Let a variable which appears at least three times in I be called a serious variable. The table below defines substitutions for clauses in I containing unit and double variables. In the table we use v to denote a positive literal taken from a unit or double variable, ¬v a negative literal so taken, and x and y either a positive or negative literal which is not necessarily taken from a unit or double variable.
    var type   substitution name     occurrence                     replacement
    unit       unit elimination      (v, x, ...)                    true
    unit       unit elimination      (¬v, x, ...)                   true
    double     double elimination    (v, v, x, ...)                 true
    double     double elimination    (¬v, ¬v, x, ...)               true
    double     trivial elimination   (v, ¬v, ...)                   true
    double     pure literal rule     (v, x, ...), (v, y, ...)       true
    double     pure literal rule     (¬v, x, ...), (¬v, y, ...)     true
    double     resolution            (v, x, ...), (¬v, y, ...)      (x, ..., y, ...)

When we say apply unit elimination, we mean: according to the table above, look for a clause containing a unit variable $v$ and replace it with the logical value true; if no such clause exists, do nothing. Similar statements hold for applying any of the other substitution rules listed in the table. It is possible that, after repeated applications of the double-variable substitution rules, some double variables will occur only once in $I$. By clean up double variables we mean eliminate all clauses containing double variables that now appear only once in $I$.
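The variable classes and the substitution table can be made concrete with a short Python sketch. It is an illustration under our own assumptions, not the implementation used in this work: a clause is a tuple of nonzero integers (v for a positive literal and -v for its complement, in the DIMACS style), and the function names classify and substitute are hypothetical. A variable is weak-serious exactly when classify does not label it "unit".

```python
from collections import Counter
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # assumed encoding: v is a positive literal, -v its complement


def classify(clauses: List[Clause]) -> Dict[int, str]:
    """Label each variable by how many times it occurs in the instance."""
    counts = Counter(abs(lit) for c in clauses for lit in c)
    return {v: "unit" if k == 1 else "double" if k == 2 else "serious"
            for v, k in counts.items()}


def substitute(clauses: List[Clause], v: int) -> Tuple[str, List[Clause]]:
    """Apply the table row matching unit or double variable v; return (rule, new clauses)."""
    hits = [i for i, c in enumerate(clauses) if v in c or -v in c]
    occ = sum(1 for c in clauses for lit in c if abs(lit) == v)
    if occ == 1:  # unit elimination: the clause is replaced by true, i.e. dropped
        return "unit elimination", [c for i, c in enumerate(clauses) if i != hits[0]]
    if occ != 2:
        raise ValueError(f"variable {v} is neither a unit nor a double variable")
    if len(hits) == 1:  # both occurrences lie in the same clause
        c = clauses[hits[0]]
        rule = "trivial elimination" if (v in c and -v in c) else "double elimination"
        return rule, [d for i, d in enumerate(clauses) if i != hits[0]]
    i, j = hits
    a, b = clauses[i], clauses[j]
    rest = [d for k, d in enumerate(clauses) if k not in (i, j)]
    if (v in a) == (v in b):  # same sign in both clauses: pure literal rule
        return "pure literal rule", rest
    # opposite signs: both clauses are replaced by their resolvent (x, ..., y, ...)
    resolvent = tuple(lit for lit in a + b if abs(lit) != v)
    return "resolution", rest + [resolvent]
```

For example, with I = [(1, 2), (-1, 3), (2, 3, 4)], variable 1 is a double variable occurring with opposite signs, so substitute(I, 1) applies resolution and returns [(2, 3, 4), (2, 3)], while substitute(I, 4) applies unit elimination to the unit variable 4.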

We consider the following two algorithms:

NULL(I):
    If I has a null clause then return "unsatisfiable"
    Otherwise,
        Repeatedly apply unit elimination until opportunities vanish
        For all truth assignments t to weak-serious variables in I,
            if t satisfies I then return "satisfiable"
        Return "unsatisfiable"
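A direct transcription of NULL into Python follows, as a sketch using the same assumed integer encoding of literals (v and -v); the name null_alg is hypothetical. Once unit elimination can no longer be applied, every remaining variable occurs at least twice, so enumerating the weak-serious variables is the same as enumerating all remaining variables and the answer is exact. The enumeration is exponential in the number of weak-serious variables, which suggests why polynomial average time depends on that number being small or on a null clause being present.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence


def null_alg(clauses: List[Sequence[int]]) -> str:
    """Sketch of NULL; literals are nonzero ints, v positive and -v its complement."""
    inst = [tuple(c) for c in clauses]
    if any(len(c) == 0 for c in inst):  # a null clause can never be satisfied
        return "unsatisfiable"
    while True:  # unit elimination until opportunities vanish
        counts = Counter(abs(lit) for c in inst for lit in c)
        units = [v for v, k in counts.items() if k == 1]
        if not units:
            break
        v = units[0]
        inst = [c for c in inst if v not in c and -v not in c]
    # every variable still present occurs at least twice, i.e. is weak-serious
    weak_serious = sorted({abs(lit) for c in inst for lit in c})
    for bits in product((False, True), repeat=len(weak_serious)):
        t = dict(zip(weak_serious, bits))
        if all(any(t[abs(lit)] == (lit > 0) for lit in c) for c in inst):
            return "satisfiable"
    return "unsatisfiable"
```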


INFREQ(I):
    If I has a null clause then return "unsatisfiable"
    Otherwise,
        Repeatedly apply double variable substitution rules in order
            until opportunities vanish
        Clean up all remaining double variables
        Repeatedly apply unit elimination until opportunities vanish
        For all truth assignments t to serious variables in I,
            if t satisfies I then return "satisfiable"
        Return "unsatisfiable"
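INFREQ can be sketched the same way. Beyond the assumed integer encoding of literals, we make one interpretive choice that the pseudocode leaves open: the double-variable rules and unit elimination are repeated until neither applies, so that every variable left for the enumeration is serious and the final answer is exact. The rule application follows the substitution table, and the function names apply_rule and infreq are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

Clause = Tuple[int, ...]  # assumed encoding: v is a positive literal, -v its complement


def apply_rule(clauses: List[Clause], v: int, occ: int) -> List[Clause]:
    """Apply the table rule for a unit (occ == 1) or double (occ == 2) variable v."""
    hits = [i for i, c in enumerate(clauses) if v in c or -v in c]
    if occ == 1 or len(hits) == 1:
        # unit elimination, or both occurrences in one clause (double/trivial elimination)
        return [c for i, c in enumerate(clauses) if i != hits[0]]
    i, j = hits
    a, b = clauses[i], clauses[j]
    rest = [c for k, c in enumerate(clauses) if k not in (i, j)]
    if (v in a) == (v in b):  # pure literal rule: both clauses become true
        return rest
    return rest + [tuple(lit for lit in a + b if abs(lit) != v)]  # resolution


def infreq(clauses: List[Sequence[int]]) -> str:
    inst = [tuple(c) for c in clauses]
    if any(len(c) == 0 for c in inst):
        return "unsatisfiable"
    # Our reading: keep preprocessing, trying double-variable rules before unit
    # elimination, until no unit or double variable is left.
    while True:
        counts = Counter(abs(lit) for c in inst for lit in c)
        doubles = [v for v, k in counts.items() if k == 2]
        units = [v for v, k in counts.items() if k == 1]
        if doubles:
            inst = apply_rule(inst, doubles[0], 2)
        elif units:
            inst = apply_rule(inst, units[0], 1)
        else:
            break
    # Enumerate the serious variables; a null resolvent () is never satisfied.
    serious = sorted({abs(lit) for c in inst for lit in c})
    for bits in product((False, True), repeat=len(serious)):
        t = dict(zip(serious, bits))
        if all(any(t[abs(lit)] == (lit > 0) for lit in c) for c in inst):
            return "satisfiable"
    return "unsatisfiable"
```

For instance, on [(1, 2), (-1, 3), (2, 3)] the sketch resolves on variable 1, then removes both remaining clauses with the pure literal rule, and reports "satisfiable" without enumerating anything.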
Theorem 5: ([13])

NULL runs in polynomial average time if

a. $n = r^{\epsilon}$, $.5 > \epsilon > 0$, and $pr < r^{.5-\epsilon}$.

b. $n = r^{\epsilon}$, $1 > \epsilon > .5$, and $pr < (1 - \epsilon)\ln(n)/(2\epsilon)$.

c. $n = \beta r$, $\beta > 0$, and $2.64\,(1 - (1 - p)^{2\beta r}(1 + 2\beta pr)) < \beta e^{-2pr}$.

d. $n = r^{\gamma}$, $\gamma > 1$, and $pr < (\gamma - 1)\ln(n)/(2\gamma)$.

The result of Theorem 5b is due strictly to null clauses in random instances, yet no other analysis shows a similar result under the same conditions; indeed, the result of [3] shows that a relatively sophisticated algorithm based on the pure literal rule requires superpolynomial average time under those conditions. The results of Theorems 5a and 5d match previous results in [25] and [26].

Theorem  6:  ([12]) 

INFREQ runs in polynomial average time if

a. $n = r^{\epsilon}$, $.66 > \epsilon > 0$, and $pr < r^{.66-\epsilon}$.

b. $n = r^{\epsilon}$, $1 > \epsilon > .66$, and $pr < (1 - \epsilon)\ln(n)$.

Both Theorem 6a and Theorem 6b are improvements over the best known previous results under the conditions stated. Theorem 6b is due to the presence of null clauses, but Theorem 6a is due to pre-processing the input by eliminating from it infrequently occurring variables (those which are unit or double variables).

Other favorable results have been obtained under model $M_1$ when instances are nearly always satisfiable. From Theorem 3b, this is roughly when $n/r < -1/\lg(1 - 2^{-k})$. In these studies $k$ is assumed to be independent of $n$ and $r$. Thus it appears that the case where $\lim_{n,r\to\infty} n/r = \alpha$, where $\alpha$ is any constant greater than zero, is particularly important when considering model $M_1$.

A number of algorithms have been analyzed under $M_1$. In [6] we showed that


Theorem 7:

$A_2$ efficiently solves SAT in bounded probability under $M_1$ when

$$\lim_{n,r\to\infty} n/r < \frac{2^{k-1}}{k}\left(\frac{k-1}{k-2}\right)^{k-2}.$$
Notice that the expression on the right side of the inequality is $-O(1/k)/\ln(1 - 2^{-k})$ if $k$ is large.

Theorem 7 is significant for two reasons. First, we cannot claim that $A_2$ almost always finds a solution to a random instance of SAT when one exists, as we could in the case of the random clause model, since there is a large gap between the point where $n/r = -O(1)/\ln(1 - 2^{-k})$ and the point where $A_2$ begins to work well probabilistically ($n/r = -O(1/k)/\ln(1 - 2^{-k})$), due to the $1/k$ factor in the latter term. Second, over the range of $n/r$ for which $A_2$ is probabilistically efficient, it finds solutions efficiently only with bounded probability, whereas $A_2$ finds solutions efficiently in probability under $M_2$. Thus, in some sense, model $M_1$ generates harder instances than $M_2$ (at least as far as $A_2$ is concerned), and the results based on the latter model do not map precisely to the same kind of results based on the former.

We also studied the following generalisation of $A_2$:

A_3(I):
    Repeat
        Let c be a smallest clause in I
        Choose u randomly from c
        Remove from I all clauses containing u
        Remove from I all occurrences of comp(u)
    Until I is empty or there exist two complementary unit clauses in I
    If I is empty Then return ("satisfiable")
    Otherwise return ("give up")
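A Python sketch of $A_3$ follows, again with literals encoded as nonzero integers (an assumption). The loop tests its stopping conditions at the top rather than at the bottom, which differs from the pseudocode only in that the instance is also checked before the first iteration; we additionally stop if a null clause is present. The function name a3 and the returned list of chosen literals are illustrative additions: when the outcome is "satisfiable", setting every chosen literal true (and the remaining variables arbitrarily) satisfies the instance, because a clause is only ever removed when it contains a chosen literal.

```python
import random
from typing import FrozenSet, List, Optional, Sequence, Tuple


def a3(clauses: List[Sequence[int]], seed: Optional[int] = None) -> Tuple[str, List[int]]:
    """Sketch of A_3: returns the outcome and the literals set true, in order."""
    rng = random.Random(seed)
    inst: List[FrozenSet[int]] = [frozenset(c) for c in clauses]
    chosen: List[int] = []
    while True:
        if not inst:
            return "satisfiable", chosen
        units = {next(iter(c)) for c in inst if len(c) == 1}
        # stop on complementary unit clauses; we also stop on a null clause
        if any(-lit in units for lit in units) or any(not c for c in inst):
            return "give up", chosen
        c = min(inst, key=len)      # a smallest clause
        u = rng.choice(sorted(c))   # choose u randomly from c
        chosen.append(u)
        # drop clauses containing u; delete comp(u) from the remaining clauses
        inst = [d - {-u} for d in inst if u not in d]
```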


In [6] we showed


Theorem 8:

$A_3$ efficiently solves SAT in bounded probability under $M_1$ when

$$\lim_{n,r\to\infty} n/r < \frac{1.54 \cdot 2^{k-1}}{k+1}\left(\frac{k-1}{k-2}\right)^{k-1} \quad \text{for } 4 \le k \le 40.$$


Theorem 9:

$A_3$ efficiently solves SAT in probability under $M_1$ when

$$\lim_{n,r\to\infty} n/r < \frac{0.92 \cdot 2^{k-1}}{k}\left(\frac{k-1}{k-2}\right)^{k-2} \quad \text{for } 4 \le k \le 40.$$
These results are significant for three reasons. First, $A_3$ efficiently solves SAT in probability (almost always) over about the same range of $n/r$ in which $A_2$ efficiently solves SAT in bounded probability. Second, the range of $n/r$ over which $A_3$ is probabilistically efficient is only slightly greater than the range over which $A_2$ is probabilistically efficient. Thus, although $A_3$ performs better than $A_2$ probabilistically, there is still a wide gap between the flip point and the point at which $A_2$ begins to perform well. Third, and most important, $A_2$ and $A_3$ are vastly superior in probabilistic performance to algorithms that rely on certain greedy heuristics to select the next variable to assign a value to. An example of a greedy heuristic is "select the variable $v$ for which the difference between the number of occurrences of the literal $v$ and the literal $\bar{v}$ in $I$ is greatest, and assign variable $v$ the value which satisfies most clauses". However, as we will see below, greedy heuristics added to $A_2$ and $A_3$ improve the performance of those algorithms significantly, especially for the case $k = 3$.

In the case $k = 3$ (CNF expressions with three literals per clause are instances of the 3-Satisfiability problem, which is also NP-complete), we have found that the maximum occurring literal selection heuristic (if there are no single-literal clauses in $I$, select a variable randomly and assign it the value which satisfies most clauses) used with $A_2$ efficiently solves SAT in bounded probability under $M_1$ when $\lim_{n,r\to\infty} n/r < 2.9$. In the case $k = 3$, $A_2$ efficiently solves SAT in bounded probability when $\lim_{n,r\to\infty} n/r < 2.66$ [7]. This may be compared with the flip point ($n/r = 4$).

From our analysis in [6] and [7] we have devised the following algorithm for SAT:

A_4(I):
    Repeat
        If there is a single-literal clause {l} in I Then u ← l
        Otherwise u ← l* such that l* ∈ L and for all l ∈ L, w(l*) ≥ w(l)
        Remove from I all clauses containing u
        Remove from I all occurrences of comp(u)
        L ← L − {u, comp(u)}
    Until I is empty or there exist two complementary unit clauses in I
    If I is empty Then return ("satisfiable")
    Otherwise return ("give up")


where $w(l)$, the weight of literal $l$, is determined as follows. Let $c$ be a clause in $I$ and let $\mu(c)$ be a weighting function mapping clauses to integers. Let us say that $\mu_j(c)$ is the weight of clause $c$ at the end of the $j$th iteration of $A_4(I)$.
Initially $\mu_0(c) = 1$ for every clause $c \in I$. The clause weighting function is updated as follows: if $l$ is the literal chosen on the $j$th iteration, $W_j(l)$ is the total weight of clauses containing $l$ at the start of the $j$th iteration (these clauses will be removed), and $N_j(l)$ is the number of clauses containing $\mathrm{comp}(l)$ at the start of the $j$th iteration (one literal will be removed from each of these clauses), then

$$\mu_j(c) = \begin{cases} \mu_{j-1}(c) + N_j(l)/W_j(l) & \text{if } c \text{ contains } \mathrm{comp}(l),\\ 0 & \text{if } c \text{ contains } l,\\ \mu_{j-1}(c) & \text{otherwise.} \end{cases}$$

The literal weighting function is

«K0  -  Mi(0W) 

According to our experiments, $A_4$ solves SAT efficiently in bounded probability under model $M_1$ when $\lim_{n,r\to\infty} n/r < d$.

The significance of this result is that $A_4$ appears to efficiently solve almost all instances of 3-SAT. We hope to prove this result analytically and to devise an extension to $A_4$ which will provide similar performance for any fixed value of $k$.
4. Recent Results in Design Automation


The problem of designing a test sequence for stuck-at faults in combinational circuits was stated in Section 2. This problem can be reduced to finding a truth assignment which satisfies a Boolean expression. According to our results, this makes stuck-at testing easy, in the probabilistic sense, if the Boolean expressions are in Conjunctive Normal Form. Generally, however, they are not; in fact, the Boolean expressions we wish to solve are usually multi-level. In the case of PLAs, the Boolean expressions are two-level, and we have attacked them first.

The Boolean expressions associated with PLAs are irredundant and in Disjunctive Normal Form (DNF); that is, no conjunction is subsumed by the remainder of the DNF expression. We examined an input model $M_3$ which is the same as $M_2$ with the connectives AND and OR reversed, and found that instances generated by $M_3$ are irredundant with probability tending to 1. Thus, we have used $M_3$ as a model for the Boolean expressions of PLAs.

The internal nodes that must be checked for stuck-at faults are all at the second level. To bring the logic level of an intermediate node out to a primary output, the logic level of all other intermediate nodes must be low. Finding a test vector to do this is equivalent to finding a truth assignment which satisfies a Boolean expression that is the complement of the original expression minus the conjunction associated with the intermediate node under test. We have shown that finding such a truth assignment is easy in a probabilistic sense [15]. The next step is to extend the results to deeper-level logic.
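The reduction described above can be illustrated with a small Python sketch. We assume a PLA's two-level expression is given as a list of product terms, each a tuple of nonzero integers (i for $x_i$, -i for its complement); the function names are hypothetical. By De Morgan's laws, the complement of the sum of all terms except the one under test is a CNF expression with one clause per remaining term, and any assignment satisfying it holds every other intermediate node low. Exhaustive search stands in here for the satisfiability algorithms discussed earlier and is only practical for a handful of inputs.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Term = Tuple[int, ...]  # a product term; i means x_i, -i means NOT x_i (assumed encoding)


def complement_of_others(terms: List[Term], k: int) -> List[Term]:
    """CNF of NOT(sum of all terms except term k): by De Morgan, each other
    term l1 AND l2 AND ... contributes the clause (-l1 OR -l2 OR ...)."""
    return [tuple(-lit for lit in t) for j, t in enumerate(terms) if j != k]


def lit_true(bits: Sequence[bool], lit: int) -> bool:
    """Value of literal lit under the input vector bits (x_1 is bits[0])."""
    return bits[abs(lit) - 1] == (lit > 0)


def find_test_vector(terms: List[Term], k: int, num_vars: int) -> Optional[Tuple[bool, ...]]:
    """Exhaustive search for inputs that hold every product term except term k low."""
    cnf = complement_of_others(terms, k)
    for bits in product((False, True), repeat=num_vars):
        if all(any(lit_true(bits, lit) for lit in clause) for clause in cnf):
            return bits
    return None  # no such input vector exists
```

For the terms $x_1 x_2$, $\bar{x}_1 x_3$ and $x_2 \bar{x}_3$ with the first term under test, the CNF is $(x_1 \lor \bar{x}_3)(\bar{x}_2 \lor x_3)$, and the all-false input already drives the other two terms low.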
5. Parallel Algorithms and Quadtree Representations


In addition to our probabilistic results on algorithms for NP-complete problems, we have worked with other researchers to obtain average-case results on operations for a class of parallel algorithms and on hashing with lazy deletions. The results on parallel computations are presented here, and the results on hashing with lazy deletions are presented in the next section. The results on parallel computations are based on a data structure called a Quadtree.

A Quadtree is a natural and well-known data structure for the parallel solution of certain numerical problems by means of recursive decomposition. Quadtrees are described in [30] and [31]. A feature of Quadtrees that makes them interesting is their ability to support recursive processes which have no need to communicate with any processes other than the parent process. Unfortunately, some overhead penalty must be paid in terms of space and access time in order to make use of Quadtrees. Specifically, the two overhead factors we are concerned with are (1) the space required to represent a matrix in Quadtree format, and (2) the time required to access an element of a matrix in Quadtree format. The overhead can be kept low only when matrices are sparse (that is, when the percentage of zero elements is close to 100 percent). We have chosen to investigate overhead requirements for algorithms involving permutation matrices, such as the Fast Fourier Transform (FFT), since an n × n permutation matrix has exactly n nonzero entries and is, therefore, sparse in the traditional sense when n is large.

We have found that the average space and time requirements to maintain a Quadtree for random permutation matrices are small. Let n be any power of 2. Number all n × n permutation matrices arbitrarily but uniquely. Let $S_i(n)$ and $T_i(n)$ denote the space and average access time, respectively, required for permutation matrix i. Let $S(n)$ and $T(n)$ denote the average space and time required over all n × n permutation matrices (we assume that permutation matrices are uniformly distributed). Our results are as follows:

Theorem 10: ([32])

For any n × n permutation matrix i,

$$S_i(n) \le \frac{n \lg(n)}{2} + \frac{4n}{3} \quad\text{and}\quad T_i(n) \le \frac{\lg(n)}{2} + \frac{4}{3}.$$


We  also  showed  that 
Theorem 11: ([32])


SW=^k  +  £-!±f,  *"d 

fW=&)  +  U> 

Furthermore, we showed that the overhead requirements for the FFT permutation match the upper bounds of Theorem 10. Theorems 10 and 11 say that Quadtree maintenance overhead results in a modest slowdown factor of lg(n)/2 in both space and time. These costs may be easily recoverable through the facility for process decomposition and scheduling, which is unavailable with other representations.
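The space and access-time measures can be explored with a Python sketch. The cost model is our assumption, chosen to match the description above: each nonzero block at each level of the recursive decomposition costs one node, zero blocks are null pointers, and an access to entry (i, j) visits every nonzero block containing it from the root down. We also assume that the FFT permutation is the bit-reversal permutation. The function names are hypothetical.

```python
from typing import List, Tuple


def quadtree_cost(perm: List[int]) -> Tuple[int, float]:
    """Nodes and average access cost of a Quadtree for the permutation matrix
    with a 1 at (r, perm[r]); len(perm) must be a power of 2.

    Assumed cost model: one node per nonzero block at every level (zero blocks
    are null pointers); accessing entry (i, j) visits each nonzero block that
    contains (i, j), from the root down. Access cost is averaged over all n*n entries."""
    n = len(perm)
    lg = n.bit_length() - 1
    nodes, visits = 0, 0
    for level in range(lg + 1):
        side = n >> level  # side length of a block at this level
        blocks = {(r // side, perm[r] // side) for r in range(n)}
        nodes += len(blocks)
        visits += len(blocks) * side * side  # entries that lie in a nonzero block
    return nodes, visits / (n * n)


def bit_reversal(n: int) -> List[int]:
    """Bit-reversal permutation of 0..n-1 (assumed here to be the FFT permutation)."""
    width = n.bit_length() - 1
    return [int(format(r, f"0{width}b")[::-1], 2) if width else 0 for r in range(n)]


def theorem10_bounds(n: int) -> Tuple[float, float]:
    """n lg(n)/2 + 4n/3 and lg(n)/2 + 4/3."""
    lg = n.bit_length() - 1
    return n * lg / 2 + 4 * n / 3, lg / 2 + 4 / 3
```

Under this model the number of nonzero blocks at level j is at most min(4^j, n), and summing over the levels gives at most n lg(n)/2 + 4n/3 nodes and an average access cost of at most lg(n)/2 + 4/3. The bit-reversal permutation attains min(4^j, n) at every level; for n = 64 it needs 277 nodes, against 127 for the identity matrix.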

This work was done jointly with David S. Wise using funds supplied in part by the National Science Foundation under grant number DCR 84-05241.
6. Average Case Analysis of Hashing with Lazy Deletions

A hash table is a collection of nodes, some of which are occupied by cells containing useful data while the rest are unoccupied. For convenience, we assume n nodes numbered arbitrarily from 1 to n. Occasionally cells are "accessed" for the data they contain, new cells are "inserted" into the table, or existing cells are "deleted" from the table. Let each insertion, deletion, and access be called an ida epoch. Suppose a cell c is created at ida epoch $t_c$ and deleted at ida epoch $t_d$. The interval $(t_c, t_d)$ is said to be the lifespan of c. At $t_c$, c becomes the end of a chain of cells (occupied nodes) determined by the hash function employed and the current state of the hash table. Let chain(c) denote the chain associated with cell c. During its lifespan, c may be accessed 0, 1, 2 or more times. The number of times that c is accessed during its lifespan is called the accesspan of c. An access of c involves a visit to each of the cells in chain(c) up to c. We distinguish between a visit to c and an access of c as follows: c is accessed for its data, whereas c is visited either for a pointer to the next cell in chain(c) or for its data (thus all accesses are also visits). The number of times cells in chain(c) are visited when searching for c (for access or deletion) over the lifespan of c is called the searchspan of c. If chain(c) does not change during the lifespan of c, as is the case for traditional hashing, then the searchspan of c is the length of chain(c) times the number of searches for c over its lifespan.

The searchspan of c can be reduced by reducing the length of chain(c) dynamically. One way to do this is to move c to a node occupied by a cell in chain(c) whenever such a cell is deleted (and the node becomes unoccupied). However, this approach, although successful at reducing the searchspan of c, suffers from the high overhead required to make dynamic adjustments to chains every time a cell is deleted. Another approach, with less overhead, is to move c toward the front of chain(c) only when c is searched. This is called hashing with lazy deletions. Specifically, hashing with lazy deletions is the result of using the following algorithm to access any cell c:


ACCESS(c):
    t ← 0
    i ← 1
    repeat
        d ← hash_i(c)
        if c occupies node d and t > 0 then do the following:
            "access" cell c
            move c to node t
        otherwise, if c occupies node d then
            "access" cell c
        otherwise, if d is unoccupied and t = 0 then
            t ← d
        i ← i + 1
    until c is accessed

where hash_i(c) is the ith hash function applied to cell c, and the output of hash_i(c), for any i ≥ 1, is a node (numbered from 1 to n) in the hash table.
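ACCESS can be sketched in Python as follows. The probe sequence is an assumption: the text leaves the family hash_i open, and we use linear probing, hash_i(c) = (c + i - 1) mod n with nodes numbered from 0, purely for illustration. Because a deleted node simply becomes unoccupied, a search cannot stop at an empty node; it remembers the first one it passes (the role of t, with None playing the part of t = 0) and moves c there when c is found. The insert and delete helpers, and counting probes as the cost, are illustrative additions.

```python
from typing import List, Optional

Table = List[Optional[int]]  # None marks an unoccupied node


def probe(key: int, i: int, n: int) -> int:
    """Hypothetical hash_i(c): linear probing, i = 1, 2, ...; nodes are 0..n-1."""
    return (key + i - 1) % n


def insert(table: Table, key: int) -> int:
    """Place key in the first unoccupied node of its probe sequence."""
    n = len(table)
    for i in range(1, n + 1):
        d = probe(key, i, n)
        if table[d] is None:
            table[d] = key
            return d
    raise RuntimeError("table is full")


def access(table: Table, key: int) -> int:
    """ACCESS(c) with lazy deletions; returns the number of probes used."""
    n = len(table)
    t: Optional[int] = None  # first unoccupied node passed so far
    for i in range(1, n + 1):
        d = probe(key, i, n)
        if table[d] == key:
            if t is not None:  # move c forward into the earlier free node
                table[t], table[d] = key, None
            return i
        if table[d] is None and t is None:
            t = d
    raise KeyError(key)


def delete(table: Table, key: int) -> int:
    """Search for key and free its node; returns the number of probes used."""
    n = len(table)
    for i in range(1, n + 1):
        d = probe(key, i, n)
        if table[d] == key:
            table[d] = None
            return i
    raise KeyError(key)
```

In a table of 7 nodes holding 0, 7 and 14 in nodes 0, 1 and 2, deleting 7 leaves a hole in front of 14; the next access to 14 takes three probes and moves it into the hole, after which an access takes two.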

Let S(c) (A(c)) denote the searchspan (accesspan) of c, and let $\bar{S}$ ($\bar{A}$) be the expectation of S(c) (A(c)) over c. We wish to find $\bar{S}$, $\bar{A}$, and $\bar{S}/(\bar{A} + 1)$, the average number of visits per access and deletion of a random cell c.

The  model  used  in  the  analysis  is  now  described.  We  consider  an  arbitrarily  long 
sequence  of  insertions,  deletions,  and  accesses  in  the  table.  The  following  algorithm,  run 
repeatedly,  decides  the  outcome  of  each  ida  epoch: 

a. Uniformly choose one of the n nodes in the table. If we choose an occupied node, then do step b; otherwise do step c.

b. With probability $p_d$, delete the cell occupying the chosen node (the node becomes unoccupied); otherwise (with probability $1 - p_d$) access the cell occupying the chosen node.

c. With probability $p_i$, insert a cell at the chosen node (the node becomes occupied); otherwise (with probability $1 - p_i$) do nothing (no ida epoch on this go-round).

We choose $p_i$ and $p_d$ such that, over a long sequence of ida epochs, the occupancy of nodes in the table reaches equilibrium and the average number of occupied nodes in the table is $\alpha n$. In this case we say the table is $\alpha$-full. A table is in equilibrium only if the rate at which deletions occur equals the rate at which insertions occur. In our model, if the table is $\alpha$-full, the rate at which deletions occur is $\alpha p_d$ and the rate at which insertions occur is $(1 - \alpha)p_i$. Thus, we have

$$p_i = \frac{\alpha p_d}{1 - \alpha}.$$

It is easy to obtain

Theorem 12: ([4])

$$\bar{A} = (1 - p_d)/p_d$$
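The equilibrium argument and Theorem 12 can be checked with a small Monte Carlo sketch of the three-step model. Equating the deletion rate α p_d with the insertion rate (1 - α) p_i gives p_i = α p_d / (1 - α); and each time an occupied node is chosen its cell is deleted with probability p_d, so the accesspan is geometric with mean (1 - p_d)/p_d. The parameter values and the function name simulate are arbitrary choices for illustration, not the setup of the original experiments.

```python
import random
from typing import List, Optional, Tuple


def simulate(n: int, p_i: float, p_d: float, steps: int, seed: int = 0) -> Tuple[float, float]:
    """Run steps a-c of the model; return (final fraction of occupied nodes,
    mean accesspan of the cells deleted during the run)."""
    rng = random.Random(seed)
    accesses: List[Optional[int]] = [None] * n  # access count of each node's cell, None if unoccupied
    finished: List[int] = []                    # accesspans of deleted cells
    for _ in range(steps):
        node = rng.randrange(n)                 # step a: choose a node uniformly
        if accesses[node] is not None:          # occupied node: step b
            if rng.random() < p_d:
                finished.append(accesses[node])  # delete and record the accesspan
                accesses[node] = None
            else:
                accesses[node] += 1             # access the cell
        elif rng.random() < p_i:                # unoccupied node: step c
            accesses[node] = 0                  # insert a new cell
    occupied = sum(a is not None for a in accesses) / n
    return occupied, sum(finished) / len(finished)
```

With n = 1000, α = 0.9 and p_d = 0.1 (so p_i = 0.9), a run of 200,000 steps should give an occupancy close to 0.9 and a mean accesspan close to 9.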

Our  main  result  is 
Theorem  13:  ([4]) 


5  -  V'  *  l/0(nW))*-1  +  l/0(ettv^))f?r(si  >  a.| sq  >  a;) 

~  1  -(l-  pd)pr(s i  >  *I#o  >  *.  A’j ) 


where 


r(,‘ - 11,5 -  *> = £  C  <  )a‘“"(1 " 


and $Q_{t,m}$ is the event that there is an unoccupied node in chain($c_m$) on the next ida epoch, given that node m is accessed next.


Although this theorem looks formidable, it is actually quite useful because the sums are not very sensitive to pr($Q_{t,m}$). For example, in one extreme case, with $p_d = 1$ ($\bar{A} = 0$), we have

$$\bar{S} = \frac{1}{1 - \alpha}.$$

In the other extreme case, with $p_d$ tending to 0 ($\bar{A}$ large), we have, for large n,

$$\bar{S} \approx \frac{-\ln(1 - \alpha)}{\alpha p_d}$$

and

$$\frac{\bar{S}}{\bar{A} + 1} \approx \frac{-\ln(1 - \alpha)}{\alpha}.$$

If we set pr($Q_{t,m}$) = 0 in Theorem 13, we get an easy-to-compute upper bound on $\bar{S}$ for any value of $p_d$. This upper bound is fairly close to measurements of $\bar{S}$ obtained experimentally. The number of visits per access and deletion of a random cell using hashing without lazy deletions is $1/(1 - \alpha)$. Thus, the savings in the number of visits per access and deletion from using lazy deletions, when a table is nearly full and cells are accessed many times before they are deleted, are considerable: for example, if α = .9 the saving is about 75%, if α = .95 the saving is about 85%, and if α = .99 the saving is about 96%.

This work was done jointly with Pedro Celis, a recent Ph.D. from the University of Waterloo and Assistant Professor of Computer Science at Indiana University.
8. Recent Invited Talks


"Probability in Proof Theory", 14th Symposium on Operations Research, Ulm, Germany (September 8, 1989).

"Analysis of Algorithms for Satisfiability Problems", Workshop on Boolean Functions, Propositional Logic, and AI Systems, at the Research Institute for Applied Knowledge Processing, Ulm, Germany (September 4, 1989).

"Lectures in Scheme: Object Oriented Programming", Research Institute for Applied Knowledge Processing, Ulm, Germany (August 25, 1989).

"Lectures in Scheme: Extend-Syntax", Research Institute for Applied Knowledge Processing, Ulm, Germany (August 18, 1989).

"Lectures in Scheme: Continuations and Call/cc", Research Institute for Applied Knowledge Processing, Ulm, Germany (August 11, 1989).

"Probability in Proof Theory" at the Department of Statistics, University of Rome, Rome, Italy (July 21, 1989).

"Probability in Proof Theory" at the Department of Computer Science, University of Milan, Milan, Italy (July 17, 1989).

"Probability in Proof Theory" at the Department of Computer Science, Universität Dortmund, Dortmund, Germany (July 11, 1989).

"An Overview of the Scheme Programming Language", Research Institute for Applied Knowledge Processing, Ulm, Germany (July 7, 1989).

"Probability in Proof Theory" at the Seminar for Natural Language Processing at the University of Tubingen, Tubingen, Germany (June 30, 1989).

"Probabilistic Analysis of Algorithms for VLSI Testing and Design", 1989 CORS/TIMS/ORSA meeting, Vancouver, Canada (May 1989).

"Probabilistic Analysis of Algorithms for the Satisfiability Problem" at Rutgers University, New Brunswick, New Jersey (January 1989).

"Probabilistic Analysis of Algorithms for the Satisfiability Problem" at the Workshop on Mathematical Methods in Artificial Intelligence, Ulm, West Germany (December 1988).

"Probabilistic Analysis of Algorithms for CNF Satisfiability" at AT&T Bell Laboratories, Murray Hill, New Jersey (May 1988).
9. Recent Professional Activities


Invited visit to the FAW (Research Institute for Applied Knowledge Processing), Ulm, W. Germany, Summer of 1989.

Workshop organizer, Workshop on Boolean Functions, Propositional Logic and AI Systems, Ulm, W. Germany, September 1989.

Guest Editor: special issue of Discrete Applied Mathematics devoted to probabilistic aspects of connections between logic and combinatorics. Targeted for appearance in 1990.

Session chair: CORS/TIMS/ORSA meeting of 1989, May 8-10, Vancouver, Canada. Session title is "Probabilistic aspects of Boolean Functions in Operations Research."

Reviewer for Journal of the Association for Computing Machinery, SIAM Journal on Computing, Information Sciences, Annals of Mathematics and Artificial Intelligence, Discrete Applied Mathematics, Mathematical Programming, IEEE Transactions on Computer-Aided Design, Annals of Discrete Math, Combi-